What does fault tolerant Deep Learning need from MPI?
Deep Learning (DL) algorithms have become the de facto Machine Learning (ML) algorithm for large-scale data analysis. DL algorithms are computationally expensive: even distributed DL implementations that use MPI require days of training (model learning) time on commonly studied datasets. Long-running DL applications are therefore susceptible to faults, which requires a fault tolerant system infrastructure in addition to fault tolerant DL algorithms. This raises an important question: what is needed from MPI for designing fault tolerant DL implementations? In this paper, we address this problem for permanent faults. We motivate the need for a fault tolerant MPI specification through an in-depth consideration of recent innovations in DL algorithms and their properties, which drive the need for specific fault tolerance features. We present an in-depth discussion of the suitability of different parallelism types (model, data, and hybrid); the need (or lack thereof) for checkpointing critical data structures; and, most importantly, several fault tolerance proposals in MPI (User-Level Failure Mitigation (ULFM) and Reinit) and their applicability to fault tolerant DL implementations. We leverage a distributed memory implementation of Caffe, currently available under the Machine Learning Toolkit for Extreme Scale (MaTEx), and implement our approaches by extending MaTEx-Caffe to use ULFM. Our evaluation using the ImageNet dataset and the AlexNet and GoogLeNet neural network topologies demonstrates the effectiveness of the proposed fault tolerant DL implementation using Open MPI-based ULFM.
Whole Proteome Clustering of 2,307 Proteobacterial Genomes Reveals Conserved Proteins and Significant Annotation Issues
We clustered 8.76 M protein sequences deduced from 2,307 completely sequenced Proteobacterial genomes, resulting in 707,311 clusters of one or more sequences, of which 224,442 ranged in size from 2 to 2,894 sequences. To our knowledge, this is the first study of this scale. We were surprised to find that no single cluster contained a representative sequence from all the organisms in the study. Given the minimal genome concept, we expected to find a shared set of proteins. To determine why the clusters did not have universal representation, we chose four essential proteins representing fundamental cellular functions, the chaperonin GroEL, DNA-dependent RNA polymerase subunits beta and beta′ (RpoB/RpoB′), and DNA polymerase I (PolA), and examined their cluster distribution. We found these proteins to be remarkably conserved, with certain caveats. Although the groEL gene was universally conserved in all the organisms in the study, the protein was not represented in all the deduced proteomes. The genes for RpoB and RpoB′ were missing from two genomes and merged in 88, and the sequences were sufficiently divergent that they formed separate clusters for 18 RpoB proteins (seven clusters) and 14 RpoB′ proteins (three clusters). For PolA, 52 organisms lacked an identifiable sequence, and seven sequences were sufficiently divergent that they formed five separate clusters. Interestingly, organisms lacking an identifiable PolA and those with divergent RpoB/RpoB′ were predominantly endosymbionts. Furthermore, we present a range of examples of annotation issues that caused the deduced proteins to be incorrectly represented in the proteome. These annotation issues made our task of determining protein conservation more difficult than expected and also represent a significant obstacle for high-throughput analyses.
Learning from Experience in NSW?
While the bulk of the empirical evidence shows that municipal mergers do not improve the performance of local authorities, Australian policy-makers nonetheless continue to impose council amalgamation, as illustrated by the current New South Wales 'Fit for the Future' local government reform process. This paper first critically examines the empirical evidence on the impact of the 2004 council mergers that was employed by the Independent Local Government Review Panel. We argue that this evidence is flawed. We then provide an empirical assessment of the municipal mergers that occurred over 2000–2004, drawing our sample from Group 4 councils in the New South Wales variant of the Australian Local Government Classification System. Group 4 councils represent a group of significant regional city and town councils with similar operational activities. We demonstrate that merged councils have not performed any better than their unmerged peers over the period 2004 to 2014. The paper concludes with some brief policy implications for local government reform in New South Wales and elsewhere.
Spatial and Temporal Trends of Global Pollination Benefit
Pollination is a well-studied but threatened ecosystem service. A significant part of global crop production depends on or profits from pollination by animals. Using detailed information on global yields of 60 crops that depend on or profit from pollination, we provide a map of global pollination benefits on a 5′ by 5′ latitude-longitude grid. The current spatial pattern of pollination benefits is only partly correlated with climate variables and the distribution of cropland. The resulting map identifies hot spots of pollination benefits in sufficient detail to guide political decisions on where to protect pollination services by investing in structural diversity of land use. Additionally, we investigated the vulnerability of national economies to a potential decline in pollination services, measured as the portion of the (agricultural) economy that depends on pollination benefits. While the general dependency of the agricultural economy on pollination seems to have been stable from 1993 until 2009, we see increases in producer prices for pollination-dependent crops, which we interpret as an early warning signal of a conflict between pollination services and other land uses at the global scale. Our spatially explicit analysis of global pollination benefit points to hot spots for the generation of pollination benefits and can serve as a basis for further planning of land use, protection sites, and agricultural policies for maintaining pollination services.